Distributed Graph Clustering and Sparsification
Authors
Abstract
Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation, and cannot be directly applied to clustering many real-world networks, whose information is often collected at different sites. Designing a simple and distributed clustering algorithm is of great interest, and has wide applications for processing big datasets. In this paper we present a simple and distributed algorithm for graph clustering: for a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds, and recovers a partition of the graph that is close to optimal. One of the main components behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster-structure of the input. Compared with previous sparsification algorithms that require Laplacian solvers or involve combinatorial constructions, this component is easy to implement in a distributed way and runs fast in practice.
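The abstract does not spell out the sampling rule. As a rough illustration of the general idea (each vertex decides which of its incident edges to keep using only local information, and kept edges are reweighted), here is a minimal Python sketch. The keep probability min(1, c·ln n / deg(u)), the constant `c`, the inverse-probability reweighting, and the function name `sample_sparsifier` are all assumptions made for illustration; they are not the paper's exact scheme or parameters.

```python
import math
import random
from collections import defaultdict


def sample_sparsifier(edges, c=2.0, seed=0):
    """Sparsify a simple undirected graph by degree-based edge sampling.

    Hypothetical sketch: the keep probability min(1, c * ln(n) / deg(u))
    and the constant c are illustrative assumptions, not the paper's rule.

    edges: list of (u, v) pairs, each undirected edge listed once.
    Returns a dict mapping each kept edge to its new weight.
    """
    rng = random.Random(seed)

    # In a distributed run each vertex only needs its own degree; here all
    # degrees are computed centrally for simplicity.
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    log_n = math.log(max(len(deg), 2))

    def keep_prob(x):
        # Low-degree vertices keep all their edges; a high-degree vertex
        # keeps about c * ln(n) of its edges in expectation.
        return min(1.0, c * log_n / deg[x])

    sparse = {}
    for u, v in edges:
        pu, pv = keep_prob(u), keep_prob(v)
        # Each endpoint flips its own coin; the edge survives if either keeps it.
        if rng.random() < pu or rng.random() < pv:
            p_edge = pu + pv - pu * pv  # P(at least one endpoint keeps the edge)
            # Reweight by 1 / p_edge so every edge's expected weight stays 1.
            sparse[(u, v)] = 1.0 / p_edge
    return sparse


if __name__ == "__main__":
    # Two 40-vertex cliques joined by a single bridge edge (0, 40).
    k = 40
    edges = [(i, j) for i in range(k) for j in range(i + 1, k)]
    edges += [(k + i, k + j) for i in range(k) for j in range(i + 1, k)]
    edges.append((0, k))
    sparse = sample_sparsifier(edges)
    print(len(edges), "edges ->", len(sparse), "edges")
```

Because of the reweighting, every cut and every cluster volume has the same expected weight in the sampled graph as in the input. Showing that the cluster structure is preserved with high probability is the job of the paper's analysis, which this sketch does not reproduce.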
Similar resources
Uncertain Graph Sparsification
Uncertain graphs are prevalent in several applications including communications systems, biological databases and social networks. The ever increasing size of the underlying data renders both graph storage and query processing extremely expensive. Sparsification has often been used to reduce the size of deterministic graphs by maintaining only the important edges. However, adaptation of determi...
Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification
The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm that enables building ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectra, such as the first few eigenvectors of the original gr...
Lecture 04/13: Graph Sampling and Sparsification
In previous lectures, we discussed various topics on graphs: sparsest cut, clustering which finds small k-cuts, max flow, multicommodity flow, etc. For very large graphs, we’d like to ask “instead of regular approximation algorithms which run in polynomial time, can we find any near-linear time algorithm for minimum cut or maximum flow problems?”. A simple idea is for a given graph G, to solve t...
Single pass graph sparsification in distributed stream processing
We give a distributed one pass streaming algorithm for graph sparsification. Besides producing a sparsifier, our algorithm maintains a hierarchy of UNION-FIND data structures in a distributed manner that efficiently support queries of strong connectivities between pairs of vertices. An important component of the algorithm is an implementation of UNION-FIND queries over an Active Distributed Has...
Distributed-Memory Parallel Algorithms for Counting and Listing Triangles in Big Graphs
Big graphs (networks) arising in numerous application areas pose significant challenges for graph analysts as these graphs grow to billions of nodes and edges and are prohibitively large to fit in the main memory. Finding the number of triangles in a graph is an important problem in the mining and analysis of graphs. In this paper, we present two efficient MPI-based distributed memory parallel ...
Journal: CoRR
Volume: abs/1711.01262
Publication date: 2017